The goal of this workshop is to build on the fundamental R skills introduced in previous workshops and focus on data visualization. We will predominantly use packages from the tidyverse, including the powerful and popular ggplot2.

Plotting

We will be using ggplot2 to create beautiful plots from our data. Graphics in ggplot2 are built in layers, allowing for customization and flexibility.

The basic template to build a plot is as follows:

ggplot(data = <DATA>, mapping = aes(<MAPPING>)) + <GEOM_FUNCTION>()

The ggplot() function is first pointed at the data frame it will draw from.

ggplot(data = data_complete)

The aesthetic function aes() selects the variables to be plotted and specifies how to present them in the graph – as x and y positions or characteristics such as size, shape, color.

Once the ggplot() call is complete, you can save it as an object and refer to it later. For demonstration purposes, I will continue to show the full contents of the ggplot() function.

ggplot(data = data_complete, 
       mapping = aes(x = weight, y = hindfoot_length))

# Save the ggplot2 object
data_plot <- ggplot(data = data_complete, 
       mapping = aes(x = weight, y = hindfoot_length))

Lastly, geoms determine how the data are drawn, for example as lines, bars, or points in a scatter plot.

ggplot(data = data_complete, 
       mapping = aes(x = weight, y = hindfoot_length)) +
  geom_point()

# Referring to the ggplot() object
data_plot +
  geom_point()
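As a minimal sketch of why saving the base plot is convenient, the same object can be reused with different geoms without retyping the data and mapping. The `toy` data frame below is a made-up stand-in for data_complete, and the object names are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data standing in for data_complete
toy <- data.frame(
  weight = c(10, 20, 40, 45, 120),
  hindfoot_length = c(15, 18, 30, 33, 50)
)

# Base plot: data and mapping only, no geom yet
base_plot <- ggplot(data = toy,
                    mapping = aes(x = weight, y = hindfoot_length))

# Adding a geom returns a new plot; base_plot itself is unchanged
scatter_plot <- base_plot + geom_point()
line_plot <- base_plot + geom_line()

# layer_data() returns the data ggplot2 computed for a layer
points_drawn <- nrow(layer_data(scatter_plot))
```

Because `+` returns a new object, base_plot can be combined with as many different geoms as you like.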

Each geom understands certain aesthetics. To learn more about a particular geom, open its help page with ?geom_<TYPE>. To learn more about available aesthetics, HERE is a good article.

geom_point() can be modified with several aesthetics, including the opacity and color shown in the examples that follow.

Sometimes it helps to understand the data by reducing the opacity of each individual point. Use alpha within the geom_point() function to adjust the opacity with a number between 0 and 1.

# Reducing the opacity with alpha 
ggplot(data = data_complete, 
       mapping = aes(x = weight, y = hindfoot_length)) +
  geom_point(alpha = 0.1)

Here we can change the color of the data points using color within the geom_point() function.

# Changing the color and opacity
ggplot(data = data_complete, 
       mapping = aes(x = weight, y = hindfoot_length)) +
  geom_point(alpha = 0.1, 
             color = "purple")

It can be helpful to separate groups within the dataset with different colors. Using the aes() function within geom_point(), we can assign default colors to each group identified within “species_id”. A legend is added to the plot automatically.

# Changing the color by "species_id"
ggplot(data = data_complete, 
       mapping = aes(x = weight, y = hindfoot_length)) +
  geom_point(alpha = 0.1, 
             aes(color = species_id))
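The difference between setting a color (outside aes()) and mapping a color to a column (inside aes()) can be checked with a small sketch. The `toy` data frame and its species labels are invented for illustration; layer_data() is used only to inspect what ggplot2 computed.

```r
library(ggplot2)

# Hypothetical toy data standing in for data_complete
toy <- data.frame(
  weight = c(10, 20, 40, 45, 120, 130),
  hindfoot_length = c(15, 18, 30, 33, 50, 52),
  species_id = c("A", "A", "B", "B", "C", "C")
)

# Setting: one fixed color for every point, no legend
p_set <- ggplot(toy, aes(x = weight, y = hindfoot_length)) +
  geom_point(color = "purple")

# Mapping: color depends on a column, and a legend is generated
p_map <- ggplot(toy, aes(x = weight, y = hindfoot_length)) +
  geom_point(aes(color = species_id))

# Colors actually assigned to the points in each plot
set_colors <- unique(layer_data(p_set)$colour)
map_colors <- unique(layer_data(p_map)$colour)
```

Here set_colors holds the single value "purple", while map_colors holds one default color per species.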

If we want to change the data in the plot, we can indicate a different column of data to use in the ggplot() function in the mapping argument. Let’s change x to “species_id” and y to “weight”.

# Plotting weight (y) for each "species_id" (x)
ggplot(data = data_complete, 
       mapping = aes(x = species_id, y = weight)) +
  geom_point(alpha = 0.1, 
             aes(color = species_id))

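Let’s change the plot to something more informative. A boxplot summarizes the points in each group: the box spans the 25th to the 75th percentile, the middle line marks the median, and the whiskers extend to the most extreme points within 1.5x the inter-quartile range (IQR) of the box. Points beyond the whiskers are drawn individually as outliers.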

# Changing type of plot by using `geom_boxplot()`
ggplot(data = data_complete, 
       mapping = aes(x = species_id, y = weight)) +
  geom_boxplot(alpha = 0.1, 
               aes(color = species_id))
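The quantities a boxplot displays can be reproduced by hand in base R. This is a sketch on a made-up vector of weights for a single group, using the default quantile() method; it is meant to illustrate the definitions, not to replicate ggplot2's internals exactly.

```r
# Hypothetical weights for one group
x <- c(4, 5, 6, 7, 8, 9, 10, 30)

q1 <- quantile(x, 0.25, names = FALSE)   # 25th percentile (bottom of box)
med <- median(x)                         # middle line
q3 <- quantile(x, 0.75, names = FALSE)   # 75th percentile (top of box)
iqr <- q3 - q1                           # inter-quartile range

# Fences: 1.5 x IQR beyond the box
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Whiskers end at the most extreme points inside the fences
whisker_low <- min(x[x >= lower_fence])
whisker_high <- max(x[x <= upper_fence])

# Anything outside the fences is drawn as an individual outlier
outliers <- x[x < lower_fence | x > upper_fence]
```

For this vector the whiskers end at 4 and 10, and the value 30 is the only outlier.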

Since the species are labeled we don’t need to color code the graph but you can if you want. Here we will make all boxplots purple with a white fill.

# Making all boxplots purple with a white fill
ggplot(data = data_complete, 
       mapping = aes(x = species_id, y = weight)) +
  geom_boxplot(color = "purple", 
               fill = "white")

To distinguish the outliers, we can modify the opacity and color.

# Changing colors and characteristics of the outliers
# Use `?geom_boxplot` to learn more
ggplot(data = data_complete, 
       mapping = aes(x = species_id, y = weight)) +
  geom_boxplot(color = "purple", 
               fill = "white",
               outlier.color = "black",
               outlier.alpha = 0.1)

To visualize the data differently, we can put the y-axis on a log10 scale. This spreads out the smaller values so their differences are visible, while the larger values still fit within the space of the plot.

# Modifying the y-axis to be on a log10 scale
# Use `?scale_y_log10` to learn more
ggplot(data = data_complete, 
       mapping = aes(x = species_id, y = weight)) +
  geom_boxplot(color = "purple", 
               fill = "white",
               outlier.color = "black",
               outlier.alpha = 0.1) +
  scale_y_log10()
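A small base R sketch of why the log10 scale helps: values that differ by a constant factor become evenly spaced after the transformation. The `weights` vector is made up for illustration.

```r
# Hypothetical weights spanning several orders of magnitude
weights <- c(1, 10, 100, 1000)

# On the original scale the gaps grow quickly
gaps_linear <- diff(weights)

# On the log10 scale each tenfold step is the same distance
gaps_log10 <- diff(log10(weights))
```

On the linear scale the gaps are 9, 90, and 900, so the small values are squeezed together; on the log10 scale every gap equals 1.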

Next, we can add labels, titles, and subtitles to our plot using the labs() function.

# Changing the labels
# Use `?labs` to learn more
ggplot(data = data_complete, 
       mapping = aes(x = species_id, y = weight)) +
  geom_boxplot(color = "purple", 
               fill = "white",
               outlier.color = "black",
               outlier.alpha = 0.1) +
  scale_y_log10() +
  labs(title = "Weight of Species",
       y = "Weight (log10)",
       x = "Species")

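We can change the look of our plot by using pre-loaded themes. You can find a list of available themes HERE. Themes are usually added at the end of the plot code. The order matters when you combine a complete theme such as theme_bw() with your own theme() adjustments: a complete theme replaces any theme settings added before it, so the complete theme must come first and the adjustments after it.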

A common and simple theme is theme_bw().

# Changing the theme to a simple white background
# Use `?theme_bw` to learn more
ggplot(data = data_complete, 
       mapping = aes(x = species_id, y = weight)) +
  geom_boxplot(color = "purple", 
               fill = "white",
               outlier.color = "black",
               outlier.alpha = 0.1) +
  scale_y_log10() +
  labs(title = "Weight of Species",
       y = "Weight (log10)",
       x = "Species") +
  theme_bw()
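The order of theme layers matters when a complete theme is combined with individual theme() adjustments. The sketch below adds theme objects directly, without a plot, to show which setting survives; the legend position is just an arbitrary example setting.

```r
library(ggplot2)

# Adjustment added after the complete theme: the adjustment survives
tweak_last <- theme_bw() + theme(legend.position = "bottom")

# Adjustment added before the complete theme: theme_bw() replaces it
tweak_first <- theme(legend.position = "bottom") + theme_bw()

kept <- identical(tweak_last$legend.position, "bottom")
lost <- !identical(tweak_first$legend.position, "bottom")
```

Both kept and lost are TRUE, so in a plot the safe pattern is to add theme_bw() first and any theme() adjustments afterwards.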

Time Series

For these plots, we will look at the counts per year of each genus. First, we use count() to tally the number of records for each combination of year and genus.

annual <- data_complete %>% 
  count(year, genus)

head(annual)
## # A tibble: 6 × 3
##    year genus               n
##   <dbl> <chr>           <int>
## 1  1977 Chaetodipus         3
## 2  1977 Dipodomys         222
## 3  1977 Onychomys           1
## 4  1977 Perognathus        22
## 5  1977 Peromyscus          2
## 6  1977 Reithrodontomys     2
tail(annual)
## # A tibble: 6 × 3
##    year genus               n
##   <dbl> <chr>           <int>
## 1  2002 Neotoma            42
## 2  2002 Onychomys         126
## 3  2002 Perognathus        18
## 4  2002 Peromyscus         58
## 5  2002 Reithrodontomys    20
## 6  2002 Sigmodon            9
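count() is shorthand for grouping and then summarising with n(). The sketch below shows the equivalence on a tiny made-up table; the rows are invented and do not come from data_complete.

```r
library(dplyr)

# Hypothetical toy records
toy <- tibble(
  year = c(1977, 1977, 1977, 1978, 1978),
  genus = c("Dipodomys", "Dipodomys", "Onychomys", "Dipodomys", "Onychomys")
)

# Short form
by_count <- toy %>%
  count(year, genus)

# Long form: group, then count the rows in each group
by_summarise <- toy %>%
  group_by(year, genus) %>%
  summarise(n = n(), .groups = "drop")
```

Both tables have one row per year and genus combination, with the number of records in the column n.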

Now we can plot the data over time. However, the plot does not look as we hoped. Our goal is to see the trend over time for each genus, but because the genera are not separated, geom_line() draws a single line through every point, jumping between the counts of all genera within each year.

ggplot(data = annual,
       mapping = aes(x = year, y = n)) +
  geom_line()

To draw one line per genus, we assign the group aesthetic within the aes() function in our ggplot. Now you will see several lines changing over time, but without color or a legend there is no way to tell which line belongs to which genus.

ggplot(data = annual,
       mapping = aes(x = year, y = n, group = genus)) +
  geom_line()

Another option that groups the data and also assigns colors is to use the color aesthetic within aes(). This automatically adds a legend to the plot.

ggplot(data = annual,
       mapping = aes(x = year, y = n, color = genus)) +
  geom_line()
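To check how the mapping changes what geom_line() draws, the sketch below counts the groups ggplot2 creates with and without the color aesthetic. The `toy` counts are made up, and layer_data() is used only for inspection.

```r
library(ggplot2)

# Hypothetical yearly counts for two genera
toy <- data.frame(
  year = c(1977, 1977, 1978, 1978, 1979, 1979),
  genus = c("Dipodomys", "Onychomys", "Dipodomys", "Onychomys",
            "Dipodomys", "Onychomys"),
  n = c(222, 1, 300, 5, 280, 9)
)

# No grouping: every point belongs to one line
p_single <- ggplot(toy, aes(x = year, y = n)) +
  geom_line()

# Mapping color to genus also splits the data into one line per genus
p_grouped <- ggplot(toy, aes(x = year, y = n, color = genus)) +
  geom_line()

groups_single <- length(unique(layer_data(p_single)$group))
groups_grouped <- length(unique(layer_data(p_grouped)$group))
```

The first plot has a single group, while the second has one group per genus.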

Faceting

Sometimes the data can look too complex and need to be separated for clarity. Faceting is a helpful technique that splits one plot into multiple plots based on a factor. To use faceting, we add an additional layer to our plot and separate the data by “genus”.

ggplot(data = annual,
       mapping = aes(x = year, y = n)) +
  geom_line() +
  facet_wrap(facets = vars(genus))
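facet_wrap() also accepts arguments that control the layout and the axes of the panels. The following sketch, on made-up counts, stacks the panels in one column with ncol and lets each panel use its own y-axis range with scales = "free_y", which can help when one genus is much more abundant than the others.

```r
library(ggplot2)

# Hypothetical yearly counts for two genera
toy <- data.frame(
  year = c(1977, 1977, 1978, 1978, 1979, 1979),
  genus = c("Dipodomys", "Onychomys", "Dipodomys", "Onychomys",
            "Dipodomys", "Onychomys"),
  n = c(222, 1, 300, 5, 280, 9)
)

p_facets <- ggplot(toy, aes(x = year, y = n)) +
  geom_line() +
  facet_wrap(facets = vars(genus),
             ncol = 1,            # stack the panels in a single column
             scales = "free_y")   # each panel gets its own y-axis range

# One panel per genus
n_panels <- length(unique(layer_data(p_facets)$PANEL))
```

Free scales make trends within each panel easier to see, at the cost of making heights harder to compare between panels.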

To gain more insight into how abundance changes over time, we can group our data by sex in addition to year and genus. To do this, we just add the additional grouping variable to the count() function.

annual_sex <- data_complete %>% 
  count(year, genus, sex)

head(annual_sex)
## # A tibble: 6 × 4
##    year genus       sex       n
##   <dbl> <chr>       <fct> <int>
## 1  1977 Chaetodipus F         3
## 2  1977 Dipodomys   F       103
## 3  1977 Dipodomys   M       119
## 4  1977 Onychomys   F         1
## 5  1977 Perognathus F        14
## 6  1977 Perognathus M         8

Now we reassign the data argument in the ggplot() function and add the argument color to the aes() function. This will produce a faceted plot with two different colored lines in each plot.

ggplot(data = annual_sex,
       mapping = aes(x = year, y = n, color = sex)) +
  geom_line() +
  facet_wrap(facets = vars(genus))

You can further customize the plot by indicating factors for rows and columns within the facet_grid() function. Here we will separate the data with the factor “sex” in each row and “genus” in the columns.

ggplot(data = annual_sex,
       mapping = aes(x = year, y = n, color = sex)) +
  geom_line() +
  facet_grid(rows = vars(sex), cols = vars(genus))

By indicating a factor for rows only, you can arrange all panels in a single column, one row per level.

ggplot(data = annual_sex,
       mapping = aes(x = year, y = n, color = sex)) +
  geom_line() +
  facet_grid(rows = vars(genus))

Similarly, you can have your data faceted by columns only.

ggplot(data = annual_sex,
       mapping = aes(x = year, y = n, color = sex)) +
  geom_line() +
  facet_grid(cols = vars(genus))

Exporting Plots

We can easily save our plots using the ggsave() function. This will allow you to customize the dimensions (width, height) and resolution (dpi) of your plot with a few simple arguments.

Save your plot as an object so that it can be passed to ggsave() explicitly. If no plot is supplied, ggsave() saves the last plot that was displayed.

myplot <- ggplot(data = annual_sex,
                 mapping = aes(x = year, y = n, color = sex)) +
  geom_line() +
  facet_grid(cols = vars(genus))

Create a new directory called “figures” and export your plot to the new directory.

dir.create("figures")

ggsave("figures/myplot_file.pdf", myplot, width = 15, height = 10)
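A slightly more explicit sketch of exporting a plot is given below. It writes to a temporary directory only so that it does not touch your project, and the data and file names are hypothetical. The units argument sets the units for width and height, and dpi sets the resolution for raster formats such as PNG.

```r
library(ggplot2)

# Hypothetical toy data and plot
toy <- data.frame(year = 1977:1980, n = c(3, 8, 5, 12))
toy_plot <- ggplot(toy, aes(x = year, y = n)) +
  geom_line()

# showWarnings = FALSE avoids a warning if the directory already exists
out_dir <- file.path(tempdir(), "figures")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

out_file <- file.path(out_dir, "toy_plot.png")
ggsave(filename = out_file, plot = toy_plot,
       width = 15, height = 10, units = "cm", dpi = 300)

saved <- file.exists(out_file)
```

The file format is inferred from the file extension, so changing ".png" to ".pdf" produces a PDF instead.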

Arranging Plots with Patchwork

If you are publishing figures, it is helpful to combine multiple plots in a single figure. We can use the package patchwork to do this. First it must be installed and loaded into our session.

install.packages("patchwork")
library(patchwork)
# Making the first plot and saving it as an object
plot_weight <- ggplot(data = data_complete, 
                      aes(x = species_id, y = weight)) +
  geom_boxplot() +
  labs(x = "Species", 
       y = expression(log[10](Weight))) +
  scale_y_log10()

# Making the second plot and saving it as an object
plot_count <- ggplot(data = annual, 
                     aes(x = year, y = n, color = genus)) +
  geom_line() +
  labs(x = "Year", 
       y = "Abundance")

# Calling both plots at the same time
plot_weight / plot_count + plot_layout(heights = c(3, 2))
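patchwork offers other layout operators as well. As a brief sketch on made-up data, `|` places plots side by side, and plot_annotation(tag_levels = "A") labels the panels A, B, and so on, which is common in published figures. The object names are hypothetical.

```r
library(ggplot2)
library(patchwork)

# Hypothetical toy data and two simple plots
toy <- data.frame(x = 1:5, y = c(2, 4, 3, 6, 5))
p1 <- ggplot(toy, aes(x = x, y = y)) + geom_point()
p2 <- ggplot(toy, aes(x = x, y = y)) + geom_line()

# Side by side, with panel tags A and B
side_by_side <- (p1 | p2) + plot_annotation(tag_levels = "A")

# Stacked, with the top plot taking 3/5 of the height
stacked <- (p1 / p2) + plot_layout(heights = c(3, 2))
```

Wrapping the combination in parentheses makes explicit that the layout or annotation applies to the whole figure.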

Combined figures can be exported just like single plots. Save the combined figure as an object and refer to it within the ggsave() function.

plot_combined <- plot_weight / plot_count + plot_layout(heights = c(3, 2))

ggsave("figures/plot_combined.png", plot_combined, width = 10, dpi = 300)